Spoken Term Detection Using Spoken Document Index Based on Keywords Collected from Automatic Speech Recognition Result

Authors

Kentaro Domoto

Takehito Utsuro

Hiromitsu Nishizaki

Abstract

This paper presents a novel spoken document indexing framework for Spoken Term Detection (STD). The proposed method uses an STD engine to build an index from keywords collected from the outputs of automatic speech recognition systems. STD is run with every keyword as a query term, which yields the detection result: a set of keywords, each with its detection intervals in the spoken document. For keywords that have competitive intervals, we rank them based on the STD matching cost and select the best one with the longest duration among the competitive detections. This is the final output of the STD process and serves as an index word for the spoken document. The proposed framework was evaluated in an STD task using real lecture speeches as spoken documents. The results show that the framework was quite effective at preventing false detection errors and at annotating spoken documents with keyword indices.
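The selection step described in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the authors' implementation: the `Detection` record, the `overlaps` and `build_index` helpers, the assumption that a lower matching cost is better, and the choice to use duration as a tie-breaker after cost are all hypothetical, since the abstract does not specify these details.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One STD hit for a keyword (hypothetical record layout)."""
    keyword: str
    start: float  # interval start, in seconds
    end: float    # interval end, in seconds
    cost: float   # STD matching cost; ASSUMED: lower means a better match

    @property
    def duration(self) -> float:
        return self.end - self.start


def overlaps(a: Detection, b: Detection) -> bool:
    """Two detections compete when their time intervals intersect."""
    return a.start < b.end and b.start < a.end


def build_index(detections: list[Detection]) -> list[Detection]:
    """Keep one detection per group of competing intervals.

    ASSUMED ranking: ascending matching cost, then longer duration first.
    Candidates are accepted greedily if they do not overlap a kept one.
    """
    ranked = sorted(detections, key=lambda d: (d.cost, -d.duration))
    selected: list[Detection] = []
    for candidate in ranked:
        if not any(overlaps(candidate, kept) for kept in selected):
            selected.append(candidate)
    # Return the index entries in time order.
    return sorted(selected, key=lambda d: d.start)
```

With detections for "speech" (1.0 to 1.5 s) and "speech recognition" (1.0 to 2.2 s) at the same cost, this sketch would keep the longer "speech recognition" entry and discard the shorter competing one.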

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Spoken Term Detection Using SVM-Based Classifier Trained with Pre-Indexed Keywords

This study presents a two-stage spoken term detection (STD) method that uses the same STD engine twice and a support vector machine (SVM)-based classifier to verify detected terms from the STD engine’s output. In a front-end process, the STD engine is used to preindex target spoken documents from a keyword list built from an automatic speech recognition result. The STD result includes a set of ...

Two-step spoken term detection using SVM classifier trained with pre-indexed keywords based on ASR result

This paper presents a novel two-step spoken term detection (STD) method that uses the same STD engine twice and a support vector machine (SVM)-based classifier to verify detected terms from the output of the second STD engine. In the first STD process, pre-indexing of the target spoken documents from a keyword list built from the results of automatic speech recognition of the speeches is perfor...

Combining State-level and DNN-based Acoustic Matches for Efficient Spoken Term Detection in NTCIR-12 SpokenQuery&Doc-2 Task

Recently, in spoken document retrieval tasks such as spoken term detection (STD), there has been increasing interest in using a spoken query. In STD systems, an automatic speech recognition (ASR) front end is often employed for its reasonable accuracy and efficiency. However, the out-of-vocabulary (OOV) problem at the ASR stage has a great impact on STD performance for spoken queries. In this paper, we pr...

Spoken Term Detection Using Phoneme Transition Network from Multiple Speech Recognizers' Outputs

Spoken Term Detection (STD) that considers the out-of-vocabulary (OOV) problem has generated significant interest in the field of spoken document processing. This study describes STD with false detection control using phoneme transition networks (PTNs) derived from the outputs of multiple speech recognizers. PTNs are similar to subword-based confusion networks (CNs), which are originally derive...

Publication date: 2016
